%HPC TFET

\subsection{Extending the boundaries of high performance computing with steep-slope devices}


Emerging devices expand the regions of system design space considered both plausible for mass deployment, in terms of manufacturability as well as operating conditions, and preferable in terms of performance, power and cost. 
The technique of 3D integration is rapidly maturing~\cite{ionescu-3D}, and we seek to maximize the benefits offered by processor designs comprising steep-slope devices, such as HTFETs~\cite{mookerjea,seabaugh,dac11}.

Although steep-slope device-based processors are noted for their low-voltage, power-efficient operation, they also have an important role to play in the high performance computing domain.
Both 3D integration and TFET designs offer the potential to extend the maximum number of aggressive cores possible within a viable yield and thermal budget. 
By adopting 3D designs, it is possible to ameliorate problems with yield and thermal limitations that affect current manycore designs.

It is known that yield decreases super-linearly with increasing
area~\cite{yibo-yield-iccad}, and that communication costs among cores scale poorly in planar designs~\cite{reetu-3d-cost}. 
As a result, transitioning to 3D integration offers a very direct means to achieve meaningfully higher core counts in tightly integrated systems. 
However, this aggravates thermal limitations due to the huge increase in power density, since there are additional heat sources and insulators between the cooling system and lower layers in the chip stack. 
Hence it becomes necessary to operate the cores at extremely low frequencies to ensure that the power dissipated and, consequently, the power and thermal density, remain within acceptable limits.
Because steep-slope devices offer fundamental reductions in leakage current and switching energy at lower operating frequencies, adopting steep-slope processor designs in place of CMOS allows more layers within the same thermal budget. This lets 3D TFET-based designs scale to sufficient parallelism to overcome the limited serial performance of TFET-based processors. 
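
As a first-order illustration of this argument, consider a one-dimensional thermal resistance model. This is a simplifying sketch, not necessarily the thermal model behind the results reported here, and its symbols are introduced only for exposition: $n$ identical layers each dissipate power $P$, the heat sink has thermal resistance $R_{hs}$ to the ambient temperature $T_{amb}$, and each inter-layer interface has thermal resistance $R_{l}$. Counting layers from the heat sink, the interface between layers $i$ and $i+1$ carries the heat of the $n-i$ layers beyond it, so the layer farthest from the sink reaches
\[
  T_{max} = T_{amb} + n P R_{hs} + P R_{l} \sum_{i=1}^{n-1} (n-i)
          = T_{amb} + n P R_{hs} + \frac{n(n-1)}{2}\, P R_{l}.
\]
Under these assumptions the temperature rise is linear in $P$ but grows quadratically with $n$, so a reduction in per-layer power can be traded for additional layers under a fixed $T_{max}$.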

In such a scenario, there are several considerations, such as determining the optimal multiprocessor microarchitecture and operating conditions, and obtaining the `sweet spot' between the yield and thermal constraints in terms of manufacturing cost, performance and power efficiency.


\subsection{Yield and thermal aware processor design}
On account of the super-linear decrease in yield with increase in die area, it is not viable to increase the number of cores on a single layer of a chip beyond a certain point. 
3D stacking~\cite{hanada-3d} of multiple layers on chip has been demonstrated to be beneficial in increasing overall performance due to increased bandwidth and shorter wire lengths.
In addition, it is also possible to obtain a much higher yield in comparison to an iso-area 2D design by reducing the area footprint per layer.
However, bonding two layers together incurs losses, quantified by the bonding yield. As a result, there is a trade-off between increasing
the die size and increasing the number of layers~\cite{yibo-yield-iccad}.
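
As a rough illustration of this trade-off, assume a Poisson defect model with defect density $D_0$, pre-bond testing so that only known-good dies are stacked, and a yield $Y_b$ for each bonding step. These assumptions and symbols are introduced here for exposition and are not necessarily those of~\cite{yibo-yield-iccad}. Folding a design of total area $A$ evenly onto $n$ layers then gives an effective yield, measured as good silicon obtained per unit of silicon fabricated, of
\[
  Y_{3D}(n) = e^{-D_0 A / n}\, Y_b^{\,n-1},
  \qquad
  Y_{2D} = Y_{3D}(1) = e^{-D_0 A}.
\]
With the hypothetical values $D_0 = 0.5\,\mathrm{cm}^{-2}$ and $Y_b = 0.95$, a $4\,\mathrm{cm}^{2}$ design yields about $0.14$ in 2D and about $0.35$ when folded onto two layers, whereas a $0.1\,\mathrm{cm}^{2}$ design yields about $0.95$ in 2D and about $0.93$ on two layers.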

The extra die area that 3D stacking provides makes it possible to improve the overall chip yield by introducing redundant cores. This is known as core sparing and is commonly
used as a technique to improve the overall yield of several processors in industry~\cite{emma-3d}.
The improvement in yield significantly shortens the time to market for these processors, which makes up for the additional hardware resources required for the core redundancy. 
From both a time and a cost perspective, this is a far more viable alternative than attempting to improve the fabrication process. 
The proportion of redundant cores increases with both the area footprint and the number of layers. 
For smaller chip areas, the losses due to bonding multiple layers together dominate. 
However, as the area per layer increases, the yield decreases at a faster rate and folding the cores to stack them in multiple layers can arrest this decline. 
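
To make the benefit of core sparing concrete, suppose each core is functional independently with probability $p$ (a simplifying assumption introduced here, which ignores clustered defects and failures in shared logic). A chip that is fabricated with $N$ cores and requires only $k$ functional cores then has yield
\[
  Y_{spare}(N,k) = \sum_{i=k}^{N} {N \choose i}\, p^{i} (1-p)^{N-i},
\]
which reduces to $p^{N}$ when no spares are provisioned ($k = N$) and increases as $k$ is lowered for a fixed $N$.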

\begin{figure}[ht!]
  \centering
    \epsfig{file=figs/perf_3d.eps, angle=0, width=1\linewidth, clip=}
    \caption{\footnotesize\label{fig:perf-3d} CMOS and TFET multicore speedup (normalized to a single-core CMOS configuration) for a server system comprising a maximum of 64 cores.}
\end{figure}

Figure~\ref{fig:perf-3d} shows the overall speedup for CMOS and TFET multicore systems comprising stacked cores, under yield and thermal limitations. For the purpose of this study, we consider a thermal limit of around 85--90$^{\circ}$C (358--363\,K), which is an acceptable thermal range for server-based systems. 
Although all cores have similar microarchitecture configurations (a 4-issue Ivy Bridge-like architecture), we observe that the superior thermal efficiency of TFET cores enables a larger number of cores to function within the thermal limit, resulting in a 22\% speedup over an equivalent CMOS configuration. In addition, core sparing enables us to employ a configuration that requires only 64 functioning cores out of a maximum of 128 cores.
